
Gated Recurrent Unit - GRU

Go back to [[Week 2 - Introduction]] or the [[Main AI Page]]. Part of the pages on [[Artificial Intelligence/Week 2/Natural Language Processing]] and [[Attention Mechanism]].

According to the Illustrated Guide to LSTMs and GRUs - A step-by-step guide:

The GRU is the newer generation of recurrent neural networks and is pretty similar to an LSTM. GRUs got rid of the cell state and use the hidden state to transfer information. A GRU also has only two gates: a reset gate and an update gate.
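
As a minimal sketch of what those two gates do (the standard GRU equations in NumPy; the parameter names and dict layout are assumptions for illustration, and some papers swap the roles of z and 1 - z in the final interpolation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU time step. p maps each gate name to a (W, U, b) triple
    of input weights, recurrent weights, and bias (assumed layout)."""
    W_z, U_z, b_z = p["z"]
    W_r, U_r, b_r = p["r"]
    W_h, U_h, b_h = p["h"]

    z = sigmoid(W_z @ x + U_z @ h_prev + b_z)             # update gate
    r = sigmoid(W_r @ x + U_r @ h_prev + b_r)             # reset gate
    h_cand = np.tanh(W_h @ x + U_h @ (r * h_prev) + b_h)  # candidate state
    return (1 - z) * h_prev + z * h_cand                  # new hidden state
```

The update gate z decides how much of the old hidden state to keep, and the reset gate r decides how much of it to expose when forming the candidate; there is no separate cell state.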

Though Wikipedia does mention that:

GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets.[6][7]

However, as shown by Gail Weiss, Yoav Goldberg and Eran Yahav, the LSTM is "strictly stronger" than the GRU as it can easily perform unbounded counting, while the GRU cannot. That's why the GRU fails to learn simple languages that are learnable by the LSTM.[8]
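
The counting claim can be illustrated with a toy sketch (hand-picked, saturated gate values rather than trained weights): pinning an LSTM's forget and input gates at 1 lets its cell state grow by roughly 1 per step without bound, while a GRU's new hidden state is a convex combination of tanh-bounded values and so can never leave (-1, 1):

```python
import numpy as np

c, h = 0.0, 0.0
for t in range(100):
    # LSTM: c_t = f*c + i*tanh(...), with f = i = 1 and a saturated candidate
    c = 1.0 * c + 1.0 * np.tanh(10.0)    # grows by ~1 per step, unbounded
    # GRU: h_t = (1-z)*h + z*tanh(...), candidate squashed into (-1, 1)
    z = 0.5
    h = (1 - z) * h + z * np.tanh(10.0)  # convex combination, saturates

print(round(c, 2))  # ~100.0: the cell state has counted the steps
print(round(h, 2))  # ~1.0: bounded, so it cannot count
```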

Similarly, as shown by Denny Britz, Anna Goldie, Minh-Thang Luong and Quoc Le of Google Brain, LSTM cells consistently outperform GRU cells in "the first large-scale analysis of architecture variations for Neural Machine Translation."[9] 

Figure: A GRU and its gates

See my notes on [[Long short-term memory - LSTM]] for an in-depth guide on how these work.

Video: How LSTMs work

Figure: An LSTM and a GRU side-by-side
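
For one concrete side-by-side, a quick parameter count with PyTorch's nn.LSTM and nn.GRU (the layer sizes here are arbitrary) reflects the GRU's three weight sets against the LSTM's four, about 25% fewer parameters at the same hidden size:

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128)
gru = nn.GRU(input_size=64, hidden_size=128)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print("LSTM:", n_params(lstm))  # 4 gates' worth of weights: 99,328
print("GRU: ", n_params(gru))   # 3 gates' worth of weights: 74,496
```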
